Improving Middleware Performance with AdOC: an Adaptive Online Compression Library for Data Transfer
In this article, we present the AdOC (Adaptive Online Compression) library, a user-level set of functions that enables data transmission with compression. Compression is performed dynamically during transmission, and the compression level is constantly adapted to the environment. To ease the integration of AdOC into existing software, the API closely mirrors the UNIX read and write system calls and respects their semantics. Moreover, the library is thread-safe and has been ported to many UNIX-like systems. We have tested AdOC under various conditions and with various data types. Results show that the library outperforms the POSIX read/write system calls on a broad range of networks (up to 100 Mbit/s LANs), while on Gigabit Ethernet it provides similar performance
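The adaptation idea can be sketched as follows. This is an illustrative toy, not the actual AdOC API: the function name, the backlog signal, and the level mapping are all assumptions. The compression level rises when the sender is backlogged (a slow network) and falls when buffers drain quickly.

```python
import zlib

# Hypothetical sketch of an AdOC-style adaptive write: map the sender's
# backlog (number of buffers still waiting to go out) to a zlib level,
# so a slow network gets stronger compression and a fast one gets less.
def adaptive_write(send, data, backlog, max_backlog=8):
    # 0 pending buffers -> level 0 (no compression); full queue -> level 9.
    level = min(9, (backlog * 9) // max_backlog)
    payload = zlib.compress(data, level) if level > 0 else data
    send(level, payload)
    return level

sent = []
lvl = adaptive_write(lambda l, p: sent.append((l, p)), b"x" * 1000, backlog=8)
# A full backlog selects the strongest level, and the payload round-trips.
```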
Symbolic Mapping and Allocation for the Cholesky Factorization on NUMA machines: Results and Optimizations
We discuss performance issues of the tiled Cholesky factorization on non-uniform memory access (NUMA) shared-memory machines. We show how to optimize thread and data placement in order to achieve performance gains of up to 50% compared to state-of-the-art libraries such as PLASMA or MKL
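One common way to express such a tile-to-node mapping is a 2D block-cyclic distribution; the sketch below is illustrative only (it is not claimed to be the paper's or PLASMA's actual policy) and shows how each lower-triangular tile of a tiled Cholesky factorization could be assigned to a NUMA node, so that node can first-touch the memory it will mostly work on.

```python
# Illustrative 2D block-cyclic mapping of tiles to NUMA nodes (an assumption,
# not the paper's exact scheme): tile (i, j) goes to node (i mod p, j mod q)
# on a p-by-q grid of NUMA nodes.
def tile_owner(i, j, p, q):
    """Return the NUMA node index owning tile (i, j) on a p-by-q node grid."""
    return (i % p) * q + (j % q)

# Lower-triangular tiles of a 4x4 tile matrix on a 2x2 grid of NUMA nodes.
owners = {(i, j): tile_owner(i, j, 2, 2) for i in range(4) for j in range(i + 1)}
```

The block-cyclic layout spreads the tiles of each row and column across nodes, which balances both memory traffic and work.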
Adaptive Online Data Compression
Transmitting huge data sets quickly, in the context of distributed computing over wide-area networks, can be achieved by compressing the data before transmission. However, this approach is not efficient on high-speed networks: the time to compress a large file and send it exceeds the time to send the uncompressed file. In this paper, we propose an algorithm that overlaps communication with compression and adapts the compression ratio to the network speed (the slower the network, the more efficient, and slower, the compression algorithms we use). The advantages of such an adaptive algorithm are its generality and its suitability for a large set of applications
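The selection rule described here (slower network, stronger compression) can be sketched as picking the strongest level whose compression throughput still exceeds the measured network bandwidth, so compression can overlap transmission without becoming the bottleneck. The throughput figures below are made-up placeholders, not measurements from the paper.

```python
# Hedged sketch of the adaptation rule: choose the strongest compression
# level that still compresses at least as fast as the network can send.
# (zlib level, assumed compression throughput in MB/s) -- placeholder numbers.
LEVELS = [(9, 10), (6, 40), (1, 150), (0, float("inf"))]

def pick_level(network_mbps):
    """Strongest level whose (assumed) throughput keeps up with the network."""
    for level, throughput in LEVELS:
        if throughput >= network_mbps:
            return level
    return 0
```

On a 5 MB/s link this picks level 9; on a gigabit-class link it falls back to sending uncompressed, matching the behavior the abstract describes.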
New Dynamic Heuristics in the Client-Agent-Server Model
MCT is a widely used heuristic for scheduling tasks onto grid platforms. However, when dealing with many tasks, scheduling a new task with MCT tends to dramatically delay the completion times of already mapped tasks. In this paper, we propose heuristics based on two features: a historical trace manager that simulates the environment, and a perturbation metric that quantifies the impact a newly allocated task has on already mapped tasks. Our simulations and experiments in a real environment show that the proposed heuristics outperform MCT
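As background, the MCT baseline can be sketched in a few lines; the representation below (one ready time per server, per-server task durations) is our simplification, not the paper's model.

```python
# Minimal sketch of MCT (Minimum Completion Time): each server is summarized
# by the time at which it becomes free; the new task goes to the server that
# would finish it earliest. A perturbation-aware variant would additionally
# penalize choices that delay already-mapped tasks.
def mct(ready_time, duration):
    """ready_time, duration: dicts server -> float; return the chosen server."""
    return min(ready_time, key=lambda s: ready_time[s] + duration[s])

ready_time = {"a": 10.0, "b": 3.0}
duration = {"a": 2.0, "b": 5.0}
chosen = mct(ready_time, duration)  # "b" finishes at 8.0 vs "a" at 12.0
```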
DKPN: A Composite Dataflow/Kahn Process Networks Execution Model
To address the high level of dynamism and variability in modern streaming applications (e.g. video decoding), as well as the difficulties in programming heterogeneous MPSoCs, we propose a novel execution model based on both dataflow and Kahn process networks. This paper presents the semantics and properties of this hierarchical and parametric model, called DKPN. Parameters are classified, and we show that hints can be derived from them to improve execution. We also present a scheduler framework and scheduling policies that support the model. Experiments illustrate the benefits of our approach
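For readers unfamiliar with the Kahn side of the model, here is a toy network (not DKPN itself): processes communicate only over FIFO channels and block on reads, which is what makes a Kahn network's output deterministic regardless of scheduling order.

```python
from queue import Queue

# Toy Kahn-process-network example (illustrative only, not the DKPN model):
# two processes connected by FIFO channels; reads block, writes do not.
def producer(out, n):
    for i in range(n):
        out.put(i)

def doubler(inp, out, n):
    for _ in range(n):
        out.put(inp.get() * 2)   # blocking read: Kahn semantics

a, b = Queue(), Queue()
producer(a, 3)
doubler(a, b, 3)
result = [b.get() for _ in range(3)]
```

Because each process's output depends only on its input history, running the two processes in threads, or interleaved in any order, would produce the same `result`.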
On the complexity of task graph scheduling with transient and fail-stop failures
This paper deals with the complexity of task graph scheduling with transient and fail-stop failures. While computing the reliability of a given schedule is easy in the absence of task replication, the problem becomes much more difficult when task replication is used. Our main result is that this problem is #P-complete (hence at least as hard as NP-complete problems), for both transient and fail-stop processor failures. We also study the complexity of a restricted class of schedules, where a task cannot be scheduled before all replicas of all its predecessors have completed their execution
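The "easy" unreplicated case can be made concrete: if failures are independent and each task must merely survive its own execution, the schedule's reliability is a product over tasks. The exponential-failure assumption below is ours, chosen for illustration; the paper's failure model may differ.

```python
import math

# Sketch of the easy case: without replication, a schedule succeeds iff every
# task runs to completion, so (under independent exponential failures with
# rate lam on the processor running the task) reliability is a simple product.
def schedule_reliability(tasks):
    """tasks: list of (duration, lam) pairs; return the success probability."""
    return math.prod(math.exp(-lam * d) for d, lam in tasks)
```

With replication this factorization breaks down, because a single processor failure can kill several replicas at once; that correlation is what drives the #P-completeness result.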
Process Affinity, Metrics and Impact on Performance: an Experimental Study
Process placement, also called topology mapping, is a well-known strategy to improve parallel program execution by reducing the communication cost between processes. It requires two inputs: the topology of the target machine and a measure of the affinity between processes. In the literature, the dominant affinity measure is the communication matrix, which describes the amount of communication between processes. The goal of this paper is to study the accuracy of the communication matrix as a measure of affinity. We have run an extensive set of tests on two fat-tree machines and a 3D-torus machine to evaluate several hypotheses that are often made in the literature and to discuss their validity. First, we check the correlation between algorithmic metrics and application performance. Then, we check whether a good generic process placement algorithm ever degrades performance. Finally, we examine whether the structure of the communication matrix can be used to predict the gain
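The affinity measure under study can be sketched directly; the trace format below (source rank, destination rank, byte count) is an assumption for illustration, not a format from the paper.

```python
# Minimal sketch of a communication matrix: entry m[src][dst] accumulates the
# bytes sent from rank src to rank dst, collected from a hypothetical trace.
def comm_matrix(trace, nprocs):
    m = [[0] * nprocs for _ in range(nprocs)]
    for src, dst, nbytes in trace:
        m[src][dst] += nbytes
    return m

trace = [(0, 1, 100), (1, 0, 50), (0, 1, 25)]
M = comm_matrix(trace, 2)
```

A placement algorithm then tries to put pairs of ranks with large entries on cores that are close in the machine topology; the paper's question is how well this matrix actually predicts performance.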
Improving MPI Applications Performance on Multicore Clusters with Rank Reordering
Modern hardware architectures, featuring multiple cores and a complex memory hierarchy, raise challenges that need to be addressed by parallel application programmers. It is therefore tempting to adapt an application's communication pattern to the characteristics of the underlying hardware. The MPI standard features several functions that allow the ranks of MPI processes to be reordered according to a graph attached to a newly created communicator. In this paper, we explain how the MPICH2 implementation of the MPI_Dist_graph_create function was modified to reorder the MPI process ranks so as to match the application communication pattern to the hardware topology. Experimental results on a multicore cluster show that improvements can be achieved as long as the application communication pattern is expressed by a relevant metric
Scheduling on the Grid : Historical Trace and Dynamic Heuristics
We present a historical trace manager and new dynamic scheduling heuristics that can be used, and are studied, in the client-agent-server model on the grid. These heuristics rely on common knowledge of the characteristics of the tasks submitted to the agent, as well as on the historical trace built from the tasks submitted to each server. We study each heuristic and compare them, on several metrics, to an instantiation of MCT (Minimum Completion Time), chosen as the reference heuristic. The simulation experiments we have conducted show that they are likely to give good results when tested in a real environment
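A historical trace manager of the kind described here can be sketched as follows; the class and method names are ours, and the model (one FIFO queue per server, known task durations) is a deliberate simplification.

```python
from collections import defaultdict

# Sketch of a historical trace manager (illustrative naming): the agent keeps,
# per server, when that server's queue drains, and re-simulates it to predict
# when a newly submitted task would complete.
class TraceManager:
    def __init__(self):
        self.busy_until = defaultdict(float)

    def estimate_completion(self, server, arrival, duration):
        start = max(self.busy_until[server], arrival)
        return start + duration

    def record(self, server, arrival, duration):
        self.busy_until[server] = self.estimate_completion(server, arrival, duration)

tm = TraceManager()
tm.record("s1", arrival=0.0, duration=4.0)
eta = tm.estimate_completion("s1", arrival=1.0, duration=2.0)  # queued until 4.0
```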